Add shared monitor dispatcher scheduler#185
Conversation
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 107ca2e323
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| "strategy_profile": runtime_target.get("strategy_profile") or source.get("STRATEGY_PROFILE"), | ||
| "account_scope": runtime_target.get("account_scope") or source.get("ACCOUNT_REGION") or source.get("account_scope"), | ||
| "runtime_target_enabled": source.get("runtime_target_enabled", source.get("RUNTIME_TARGET_ENABLED", True)), | ||
| "scheduler": runtime_target.get("scheduler") if isinstance(runtime_target.get("scheduler"), dict) else {}, |
There was a problem hiding this comment.
Populate fallback monitor schedules
When CLOUD_RUN_SERVICE_TARGETS_JSON contains the minimal target entries supported elsewhere (for example {"service":"svc-a"}), or when the generated RUNTIME_TARGET_JSON has no scheduler, this writes {} into MONITOR_DISPATCH_TARGETS_JSON. The new dispatcher only calls /probe and /dry-run when probe_time/precheck_time are present, and this workflow later deletes the legacy probe/precheck jobs, so default-configured services silently stop receiving monitor checks despite the CLOUD_SCHEDULER_PROBE_TIME/CLOUD_SCHEDULER_PRECHECK_TIME defaults. Please include the resolved fallback schedule in each monitor target before removing the old jobs.
Useful? React with 👍 / 👎.
| result = dispatch_due_monitors(load_monitor_targets()) | ||
| except Exception as exc: | ||
| return _handle_route_runtime_error(exc, route_label="POST /monitor-dispatch") | ||
| return json.dumps(result, ensure_ascii=False), 200, {"Content-Type": "application/json"} |
There was a problem hiding this comment.
Propagate dispatch failures to Cloud Scheduler
If any downstream monitor request times out, returns 403/500, or otherwise fails, dispatch_due_monitors reports {"ok": false} but this route still returns HTTP 200. In that scenario the shared Cloud Scheduler job treats the run as successful and won't retry or surface the dispatcher failure, even though no target-side notification may have happened (for example on IAM/network failures). Return a non-2xx status when result["ok"] is false so the scheduler can detect the failed dispatch.
Useful? React with 👍 / 👎.
Summary
Tests